
    Calibrating the Performance of SNP Arrays for Whole-Genome Association Studies

    To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics for evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a cohort of individuals that are both sampled independently of the original SNPs and individuals used in developing the arrays. Without an independent test set, previous estimates of genetic coverage and statistical power may be subject to overfitting bias. Additionally, the statistical power of SNP arrays in WGAS has not been systematically assessed on real traits. One robust setting for doing so is to evaluate statistical power on thousands of traits measured in a single set of individuals. In this study, 359 newly sampled Americans of European descent were genotyped using both the Affymetrix 500K (Affx500K) and Illumina 650Y (Ilmn650K) SNP arrays. From these data, we obtained estimates of genetic coverage that are robust to overfitting by constructing an independent test set from among these genotypes and individuals. Furthermore, we collected liver tissue RNA from the participants and profiled these samples on a comprehensive gene expression microarray. The RNA levels were used as a large-scale set of quantitative traits to calibrate the relative statistical power of the commercial arrays. Our genetic coverage estimates are lower than previous reports, providing evidence that previous estimates may be inflated due to overfitting. The Ilmn650K platform showed reasonable power (50% or greater) to detect SNPs associated with quantitative traits when the signal-to-noise ratio (SNR) is greater than or equal to 0.5 and the causal SNP's minor allele frequency (MAF) is greater than or equal to 20% (N = 359).
In testing each of the more than 40,000 gene expression traits for association with each of the SNPs on the Ilmn650K and Affx500K arrays, we found that the Ilmn650K yielded 15% more discoveries than the Affx500K at the same false discovery rate (FDR) level.
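
The platform comparison above counts discoveries at a fixed FDR level. The abstract does not say which FDR procedure was used, so what follows is only a sketch. It assumes the Benjamini-Hochberg step-up procedure. The function name `bh_discoveries` and both p-value lists are hypothetical and are not data from the study.

```python
from typing import Sequence


def bh_discoveries(p_values: Sequence[float], fdr: float) -> int:
    """Count the tests rejected by the Benjamini-Hochberg step-up procedure.

    The p-values are sorted; the largest rank k with p_(k) <= k * fdr / m is
    found, and the k smallest p-values are declared discoveries.
    """
    m = len(p_values)
    n_reject = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * fdr / m:
            n_reject = k  # step-up rule: keep the largest qualifying rank
    return n_reject


# Hypothetical p-values for the same eight traits on two platforms.
platform_a = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.5, 0.7, 0.9]
platform_b = [0.0001, 0.0008, 0.02, 0.04, 0.3, 0.6, 0.8, 0.95]

a_hits = bh_discoveries(platform_a, fdr=0.05)  # 3
b_hits = bh_discoveries(platform_b, fdr=0.05)  # 2
print(f"relative yield: {a_hits / b_hits - 1:.0%} more discoveries")  # prints "relative yield: 50% more discoveries"
```

Both platforms are scored on the same traits at the same FDR, so the ratio of discovery counts is directly comparable. The study's "15% more" figure is a relative yield of this kind.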

    Two-stage analyses of sequence variants in association with quantitative traits

    We propose a two-stage design for the analysis of sequence variants in which a proportion of genes that show some evidence of association are identified initially and then followed up in an independent data set. We compare two different approaches. In both approaches, the same summary measure (total number of minor alleles) is used for each gene in the initial analysis. In the first (simple) approach, the same summary measure is used in the analysis of the independent data set. In the second (alternative) approach, a more specific hypothesis is formed for the second stage: the summary measure is the count of minor alleles in only those variants that, in the initial data, showed the same direction of association as was seen overall. We applied the methods to the simulated quantitative traits of Genetic Analysis Workshop 17, blind to the simulation model, and then evaluated their performance once the underlying model was known. Performance was similar for most genes, but the simple strategy considerably outperformed the alternative strategy for one gene, where most of the effect was due to very rare variants; this suggests that the alternative approach would not be advisable when the effect is seen in very rare variants. Further simulations are needed to investigate the potentially superior power of the alternative method when some variants within a gene have opposing effects. Overall, the power to detect associations was low; this was also true when using a more powerful joint analysis that combined the two stages of the study.
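
The two second-stage summary measures can be sketched as follows. This is an illustration under stated assumptions. Here, the direction of association for each variant (and for the whole gene) is taken as the sign of its covariance with the trait in the stage-1 data; a real analysis would use a regression test. The helper names and toy genotypes are hypothetical, and the stage-2 association test itself is not shown.

```python
from statistics import fmean
from typing import List, Sequence

# Rows = individuals, columns = variants; entries are minor-allele counts (0, 1, 2).
Genotypes = Sequence[Sequence[int]]


def simple_burden(genotypes: Genotypes) -> List[int]:
    """Total number of minor alleles per individual across all variants in a gene."""
    return [sum(row) for row in genotypes]


def direction(x: Sequence[float], y: Sequence[float]) -> int:
    """Sign (+1, -1 or 0) of the sample covariance between x and y."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (cov > 0) - (cov < 0)


def concordant_variants(genotypes: Genotypes, trait: Sequence[float]) -> List[int]:
    """Indices of variants whose stage-1 direction matches that of the whole gene."""
    overall = direction(simple_burden(genotypes), trait)
    return [
        j
        for j in range(len(genotypes[0]))
        if direction([row[j] for row in genotypes], trait) == overall
    ]


def restricted_burden(genotypes: Genotypes, keep: Sequence[int]) -> List[int]:
    """Minor-allele count per individual over the selected variants only."""
    return [sum(row[j] for j in keep) for row in genotypes]


# Hypothetical stage-1 data: variants 0 and 1 raise the trait, variant 2 lowers it.
stage1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
trait1 = [3.0, 3.0, 0.0, 1.0]
keep = concordant_variants(stage1, trait1)  # [0, 1]

# Stage-2 scores for two new individuals under each approach.
stage2 = [[1, 1, 1], [0, 0, 2]]
print(simple_burden(stage2), restricted_burden(stage2, keep))  # [3, 2] [2, 0]
```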

    Effectiveness of strategies to increase the validity of findings from association studies: size vs. replication

    Background: The capacity of multiple comparisons to produce false-positive findings in genetic association studies is abundantly clear. To address this issue, the false positive report probability (FPRP) measures "the probability of no true association between a genetic variant and disease given a statistically significant finding". This concept involves the prior probability of an association between a genetic variant and a disease, which makes acceptable FPRP levels difficult to achieve when the prior probability is low. Increasing the sample size is of limited use in improving the situation.
    Methods: To clarify this problem further, the concept of true report probability (TRP) is introduced by analogy to the positive predictive value (PPV) of diagnostic testing. The approach is extended to consider the effects of replication studies. The formula for the TRP after k replication studies is derived mathematically and shown to depend only on the prior probability, alpha, power, and the number of replication studies.
    Results: Case-control association studies are used to illustrate the TRP concept for replication strategies. Based on power considerations, a relationship is derived between the TRP after k replication studies and the sample size of each individual study. This relationship enables study designers to optimize study plans. Further, it is demonstrated that replication efficiently increases the TRP even when the prior probability of an association is low, without requiring very large sample sizes for each individual study.
    Conclusions: True report probability is a comprehensive and straightforward concept for assessing the validity of positive statistical test results in association studies. Its extension to replication strategies demonstrates transparently that replication is highly effective in distinguishing spurious from true associations. Based on the generalized TRP method for replication designs, optimal research strategies and sample size planning become possible.
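
The abstract states that the TRP after k replication studies depends only on the prior probability, alpha, power, and k. The sketch below shows how such a formula can arise from Bayes' rule. It rests on our own assumptions, which are not necessarily the paper's exact derivation: the initial study and the k replications are independent, they share the same alpha and power, and a finding is reported only if all k + 1 studies are significant. Under those assumptions TRP = π(1 − β)^(k+1) / [π(1 − β)^(k+1) + (1 − π)α^(k+1)], where π is the prior probability and 1 − β the power.

```python
def true_report_probability(prior: float, alpha: float, power: float, k: int = 0) -> float:
    """Probability that an association is real given that the initial study and
    k replication studies were all statistically significant.

    Assumes k + 1 independent studies with the same alpha and power. With k = 0
    this is the positive predictive value, i.e. 1 - FPRP.
    """
    n_studies = k + 1
    true_pos = prior * power ** n_studies         # all studies detect a real effect
    false_pos = (1 - prior) * alpha ** n_studies  # all studies are false positives
    return true_pos / (true_pos + false_pos)


# Low prior (1 in 1,000), alpha = 0.05, power = 0.8 in every study.
for k in range(3):
    print(k, round(true_report_probability(0.001, 0.05, 0.8, k), 3))
# prints: 0 0.016 / 1 0.204 / 2 0.804
```

Under these assumptions, two successful replications lift the TRP from under 2% to about 80% even with a prior of 1 in 1,000. This matches the abstract's point that replication is efficient when the prior is low.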

    A p53-regulated apoptotic gene signature predicts treatment response and outcome in pediatric acute lymphoblastic leukemia

    Russell O Bainer,1 Matthew R Trendowski,2 Cheng Cheng,3 Deqing Pei,3 Wenjian Yang,3 Steven W Paugh,4 Kathleen H Goss,5 Andrew D Skol,6 Paul Pavlidis,7 Ching-Hon Pui,4,8 T Conrad Gilliam,1 William E Evans,4,9,* Kenan Onel,10–13,*
    1Department of Human Genetics, 2Department of Medicine, Section of Hematology/Oncology, The University of Chicago, Chicago, IL, 3Department of Biostatistics, 4Hematological Malignancy Program, St Jude Children’s Research Hospital, Memphis, TN, 5University of Chicago Medicine Comprehensive Cancer Center, 6Department of Pediatrics, The University of Chicago, Chicago, IL, USA; 7Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada; 8Department of Oncology, 9Department of Pharmaceutical Sciences, St Jude Children’s Research Hospital, Memphis, TN, 10Division of Human Genetics and Genomics, 11Division of Hematology/Oncology and Stem Cell Transplantation, Cohen Children’s Medical Center, New Hyde Park, 12The Feinstein Institute for Medical Research, Manhasset, NY, 13Hofstra Northwell School of Medicine, Hofstra University, Hempstead, NY, USA
    *These authors contributed equally to this work.
    Abstract: Gene signatures have been associated with outcome in pediatric acute lymphoblastic leukemia (ALL) and other malignancies. However, determining the molecular drivers of these expression changes remains challenging. In ALL blasts, the p53 tumor suppressor is the primary regulator of the apoptotic response to genotoxic chemotherapy, which is predictive of outcome. Consequently, we hypothesized that the normal p53-regulated apoptotic response to DNA damage would be altered in ALL and that this alteration would influence drug response and treatment outcome.
To test this, we first used global expression profiling in related human B-lineage lymphoblastoid cell lines with either wild-type or mutant TP53 to characterize the normal p53-mediated transcriptional response to ionizing radiation (IR), and we identified 747 p53-regulated apoptotic target genes. We then sorted these genes into six temporal expression clusters (TECs) based on differences over time in their IR-induced, p53-regulated gene expression patterns. One cluster (TEC1) was associated with multidrug resistance in leukemic blasts in one cohort of children with ALL and was an independent predictor of survival in two others. Therefore, by investigating p53-mediated apoptosis in vitro, we identified a gene signature significantly associated with drug resistance and treatment outcome in ALL. These results suggest that intersecting pathway-derived and clinically derived expression data may be a powerful method to discover driver gene signatures with functional and clinical implications in pediatric ALL and perhaps other cancers as well.
Keywords: pediatric acute lymphoblastic leukemia, p53, gene expression signature, outcomes analysis

    Power analysis for genome-wide association studies

    Background: Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account the genetic model, tag SNP selection, and the population of interest are required.
    Results: The power of genome-wide association studies can be computed using a set of tag SNPs and a large number of genotyped SNPs in a representative population, such as those available through the HapMap project. As expected, power increases with increasing sample size and effect size. Power also depends on the tag SNPs selected. In some cases, more power is obtained by genotyping more individuals at fewer SNPs than fewer individuals at more SNPs.
    Conclusion: Genome-wide association studies should be designed thoughtfully, with the choice of genotyping platform and sample size determined by careful power calculations.
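
Power's dependence on sample size, effect size, and tag SNP choice can be illustrated with a common approximation. This is an assumption here, not the paper's HapMap-based procedure. For a quantitative trait, a causal SNP that explains a small fraction q2 of trait variance gives a 1-df test with noncentrality of about n·q2. Testing a tag SNP in LD r2 with the causal SNP scales that noncentrality by r2. The function name and all numbers below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tag_snp_power(n: int, q2: float, r2: float = 1.0, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df association test at a tag SNP.

    Assumptions: quantitative trait; the causal SNP explains a small fraction q2
    of trait variance; the tag SNP has LD r2 with it, so the noncentrality of the
    test statistic is approximately n * r2 * q2 (a tag with LD r2 behaves like
    the causal SNP typed in n * r2 individuals).
    """
    ncp_sqrt = (n * r2 * q2) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    return STD_NORMAL.cdf(ncp_sqrt - z_crit) + STD_NORMAL.cdf(-ncp_sqrt - z_crit)


# More individuals at an imperfect tag versus fewer individuals at the causal SNP.
print(round(tag_snp_power(n=1000, q2=0.03, r2=0.8), 3))
print(round(tag_snp_power(n=800, q2=0.03, r2=1.0), 3))  # same n * r2, same power
```

Under this approximation, a tag with r2 = 0.8 typed in 1,000 individuals matches the causal SNP typed in 800. This is one way to see why genotyping more individuals at fewer, imperfect tags can sometimes give more power.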

    The systemic lupus erythematosus IRF5 risk haplotype is associated with systemic sclerosis

    Systemic sclerosis (SSc) is a fibrotic autoimmune disease in which the genetic component plays an important role. One of the strongest SSc association signals outside the human leukocyte antigen (HLA) region corresponds to interferon (IFN) regulatory factor 5 (IRF5), a major regulator of the type I IFN pathway. In this study, we aimed to evaluate whether three different haplotypic blocks within this locus, which have been shown to alter protein function and influence systemic lupus erythematosus (SLE) susceptibility, are involved in SSc susceptibility and clinical phenotypes. For that purpose, we genotyped one representative single-nucleotide polymorphism (SNP) from each block (rs10488631, rs2004640, and rs4728142) in a total of 3,361 SSc patients and 4,012 unaffected controls of Caucasian origin from Spain, Germany, the Netherlands, Italy, and the United Kingdom. A meta-analysis of the allele frequencies was performed to analyse the overall effect of these IRF5 genetic variants on SSc. Allelic combination and dependency tests were also carried out. The three SNPs showed strong associations with the global disease (rs4728142: P = 1.34×10⁻⁸, OR = 1.22, 95% CI = 1.14–1.30; rs2004640: P = 4.60×10⁻⁷, OR = 0.84, 95% CI = 0.78–0.90; rs10488631: P = 7.53×10⁻²⁰, OR = 1.63, 95% CI = 1.47–1.81). However, the association of rs2004640 with SSc was not independent of rs4728142 (conditioned P = 0.598). The haplotype containing the risk alleles (rs4728142*A-rs2004640*T-rs10488631*C: P = 9.04×10⁻²², OR = 1.75, 95% CI = 1.56–1.97) better explained the observed association (likelihood P-value = 1.48×10⁻⁴), suggesting an additive effect of the three haplotypic blocks. No statistically significant differences were observed in comparisons between SSc patients with and without the main clinical characteristics.
Our data clearly indicate that the SLE risk haplotype also influences SSc predisposition and that this association is not sub-phenotype-specific.
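
The abstract reports a meta-analysis across cohorts from five countries but does not state the pooling method. As a hedged illustration, the sketch below uses inverse-variance fixed-effect pooling of log odds ratios. A Mantel-Haenszel analysis of the allele counts would be a common alternative. Each cohort's standard error is recovered from its reported 95% CI. The cohort values are invented for illustration.

```python
import math
from statistics import NormalDist
from typing import Iterable, Tuple

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Recover log(OR) and its standard error from a reported 95% CI."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)


def fixed_effect_meta(studies: Iterable[Tuple[float, float, float]]):
    """Inverse-variance fixed-effect pooling of per-cohort (OR, CI low, CI high).

    Returns the pooled OR, its 95% CI, and a two-sided P-value.
    """
    total_w = 0.0
    weighted_sum = 0.0
    for odds_ratio, lo, hi in studies:
        log_or, se = log_or_and_se(odds_ratio, lo, hi)
        w = 1.0 / se ** 2  # weight = inverse variance of log(OR)
        total_w += w
        weighted_sum += w * log_or
    pooled = weighted_sum / total_w
    se_pooled = 1.0 / math.sqrt(total_w)
    p_value = 2 * STD_NORMAL.cdf(-abs(pooled / se_pooled))  # tail form keeps tiny P-values
    return (
        math.exp(pooled),
        math.exp(pooled - Z95 * se_pooled),
        math.exp(pooled + Z95 * se_pooled),
        p_value,
    )


# Hypothetical per-country results (OR, 95% CI), not the study's data.
cohorts = [(1.30, 1.10, 1.54), (1.15, 0.95, 1.39), (1.25, 1.02, 1.53)]
or_, lo, hi, p = fixed_effect_meta(cohorts)
print(f"pooled OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}, P = {p:.2g}")
```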

    PGA: power calculator for case-control genetic association analyses

    Background: Statistical power calculations inform the design and interpretation of genetic association studies, but few programs are tailored to case-control studies of single nucleotide polymorphisms (SNPs) in unrelated subjects.
    Results: We have developed the "Power for Genetic Association analyses" (PGA) package, which comprises algorithms and graphical user interfaces for sample size and minimum detectable risk calculations using SNP or haplotype effects under different genetic models and study constraints. The software accounts for linkage disequilibrium and multiple statistical comparisons. The results are presented in graphs or tables and can be printed or exported in standard file formats.
    Conclusion: PGA is user-friendly software that can facilitate decision making for association studies of candidate genes, fine-mapping studies, and whole-genome scans. Stand-alone executable files and a Matlab toolbox are available for download at: http://dceg.cancer.gov/bb/tools/pga
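
The sketch below shows the kind of power calculation PGA automates, for the simplest case only: an allelic two-proportion z-test for one SNP. Bonferroni correction stands in for multiple-comparison adjustment. This does not reproduce PGA's own algorithms (genotype-level genetic models, haplotype effects, LD with markers). It assumes Hardy-Weinberg equilibrium and a per-allele odds ratio, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and the allelic OR."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


def allelic_test_power(n_cases: int, n_controls: int, p_control: float,
                       odds_ratio: float, alpha: float = 0.05, n_tests: int = 1) -> float:
    """Power of a two-sided allelic (two-proportion z) test, Bonferroni-corrected for n_tests."""
    p_case = case_allele_freq(p_control, odds_ratio)
    a_cases, a_controls = 2 * n_cases, 2 * n_controls  # allele counts (diploid)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / n_tests / 2)
    p_pool = (a_cases * p_case + a_controls * p_control) / (a_cases + a_controls)
    se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / a_cases + 1 / a_controls))
    se_alt = math.sqrt(p_case * (1 - p_case) / a_cases
                       + p_control * (1 - p_control) / a_controls)
    delta = p_case - p_control
    # Probability that the test statistic falls in either rejection region under the alternative.
    upper = STD_NORMAL.cdf((delta - z_crit * se_null) / se_alt)
    lower = STD_NORMAL.cdf((-delta - z_crit * se_null) / se_alt)
    return upper + lower


# 1,000 cases and 1,000 controls, control risk-allele frequency 0.3, allelic OR 1.3:
print(round(allelic_test_power(1000, 1000, 0.3, 1.3), 3))                    # single SNP
print(round(allelic_test_power(1000, 1000, 0.3, 1.3, n_tests=500_000), 3))  # genome-wide Bonferroni
```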

    Multiethnic Genetic Association Studies Improve Power for Locus Discovery

    To date, genome-wide association studies have focused almost exclusively on populations of European ancestry. These efforts continue with the advent of next-generation sequencing studies designed to systematically catalog and test low-frequency variation for a role in disease. A complementary approach would be to focus further efforts on cohorts of multiple ethnicities. This leverages the idea that population genetic drift may have elevated some variants to higher allele frequencies in different populations, boosting statistical power to detect an association. Based on empirical allele frequency distributions from eleven populations represented in HapMap Phase 3 and the 1000 Genomes Project, we simulate a range of genetic models to quantify the power of association studies in multiple ethnicities relative to studies that focus exclusively on samples of European ancestry. In each of these simulations, a first phase of GWAS in exclusively European samples is followed by a second GWAS phase in any of the other populations (including a multiethnic design). We find that nontrivial power gains can be achieved by conducting future whole-genome studies in worldwide populations. In particular, African populations contribute the largest relative power gains for low-frequency alleles (<5%) of moderate effect, which suffer from low power in samples of European descent. Our results emphasize the importance of broadening genetic studies to worldwide populations to ensure efficient discovery of genetic loci contributing to phenotypic trait variability, especially for traits for which large numbers of samples of European ancestry have already been collected and tested.
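
A simple additive quantitative-trait model shows why allele frequency drives power. This model is our illustrative assumption, not the paper's simulation set-up. A variant with minor allele frequency p and per-allele effect β explains about 2p(1 − p)β² of the trait variance. The noncentrality of the test is roughly proportional to N times that variance, so the sample size needed for fixed power scales as 1/[2p(1 − p)β²]. Holding β fixed across populations is itself an assumption, and the frequencies below are hypothetical.

```python
def variance_explained(maf: float, beta: float) -> float:
    """Approximate share of a standardized trait's variance explained by an additive
    variant with per-allele effect beta (in SD units), under Hardy-Weinberg equilibrium."""
    return 2 * maf * (1 - maf) * beta ** 2


def relative_sample_size(maf_reference: float, maf_other: float) -> float:
    """Samples needed in the other population as a fraction of those needed in the
    reference population for equal power, assuming the same per-allele effect and
    a required N proportional to 1 / variance explained."""
    return variance_explained(maf_reference, 1.0) / variance_explained(maf_other, 1.0)


# Hypothetical variant: MAF 1% in the European sample, 5% in another population.
print(round(relative_sample_size(0.01, 0.05), 2))  # 0.21
```

Under these assumptions, a variant at 1% frequency in Europeans but 5% elsewhere needs only about a fifth as many samples in the second population.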

    Design Considerations for Massively Parallel Sequencing Studies of Complex Human Disease

    Massively parallel sequencing (MPS) now allows entire exomes and genomes to be sequenced at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. However, for a complex disease, study designs need to accommodate substantial locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. These considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few “true” disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues, we performed simulation-based analyses to compare several strategies for MPS in complex disease. Factors examined include genetic architecture, sample size, the number of individuals selected for sequencing and their relationships, and a variety of filters based on variant type, multiple observations of genes, and concordance of genetic variants within pedigrees. A two-stage design was assumed in which genes from the MPS analysis of high-risk families are evaluated in a secondary screening phase of a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing a whole exome is 400 times that of sequencing a single candidate gene. Results indicate that, although requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree is effective for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree reduces power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
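
The abstract gives only the cost ratio: a whole exome costs 400 times a single candidate gene. Below is a minimal sketch of a cost function consistent with that ratio. It assumes cost is simply the exomes sequenced in stage 1 plus the gene-by-proband assays in stage 2. The paper's actual cost function may include other terms, and all counts below are hypothetical.

```python
EXOME_COST = 400  # whole-exome cost in units of one candidate-gene assay (ratio from the abstract)
GENE_COST = 1


def two_stage_cost(n_pedigrees: int, sequenced_per_pedigree: int,
                   n_probands: int, n_genes_followed_up: int) -> int:
    """Total cost of a hypothetical two-stage design: exome sequencing in
    high-risk pedigrees, then targeted sequencing of the selected genes in a
    larger set of probands."""
    stage1 = n_pedigrees * sequenced_per_pedigree * EXOME_COST
    stage2 = n_probands * n_genes_followed_up * GENE_COST
    return stage1 + stage2


# Hypothetical numbers: 20 pedigrees, 500 probands, 30 genes carried forward.
print(two_stage_cost(20, 2, 500, 30))  # 31000
print(two_stage_cost(20, 3, 500, 30))  # 39000
```

In this sketch, each extra relative sequenced per pedigree adds 20 × 400 = 8,000 units. The abstract reports that, in most cases, going beyond two per pedigree reduced power without lowering overall cost.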

    A functional polymorphism in the SPINK5 gene is associated with asthma in a Chinese Han Population

    Background: Mutations in SPINK5 cause Netherton syndrome, a rare recessive skin disease accompanied by severe atopic manifestations including atopic dermatitis, allergic rhinitis, asthma, high serum IgE, and hypereosinophilia. Recently, single nucleotide polymorphisms (SNPs) in SPINK5 were shown to be significantly associated with atopy, atopic dermatitis, asthma, and total serum IgE. To determine the role of SPINK5 in the development of asthma, we conducted a case-control study of 669 asthma patients and 711 healthy controls in a Han Chinese population.
    Methods: Using a PCR-RFLP assay, we genotyped one promoter SNP, -206G>A, and four nonsynonymous SNPs: 1103A>G (Asn368Ser), 1156G>A (Asp386Asn), 1258G>A (Glu420Lys), and 2475G>T (Glu825Asp). We also analyzed the functional significance of -206G>A using a luciferase reporter assay and an electrophoretic mobility shift assay.
    Results: We found that the G allele at SNP -206G>A was associated with increased asthma susceptibility in our study population (p = 0.002, odds ratio 1.34, 95% confidence interval 1.11–1.60). There was no significant association between any of the four nonsynonymous SNPs and asthma. The A allele at -206G>A had significantly higher transcriptional activity than the G allele. The electrophoretic mobility shift assay also showed significantly higher binding efficiency of nuclear protein to the A allele than to the G allele.
    Conclusion: Our findings indicate that the -206G>A polymorphism in SPINK5 is associated with asthma susceptibility in a Chinese Han population.
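
The allele-level odds ratio and 95% confidence interval reported above are computed from a 2×2 table of allele counts (two alleles per person). Below is a minimal sketch using Woolf's log-scale interval, which is one standard choice. The abstract does not say which interval method the study used, and the counts are invented, not the study's data.

```python
import math
from statistics import NormalDist


def allelic_odds_ratio(case_risk: int, case_other: int,
                       control_risk: int, control_other: int,
                       level: float = 0.95):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval
    from a 2x2 table of allele counts."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Woolf: SE of log(OR) is the square root of the summed reciprocal cell counts.
    se_log = math.sqrt(1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (odds_ratio,
            math.exp(math.log(odds_ratio) - z * se_log),
            math.exp(math.log(odds_ratio) + z * se_log))


# Hypothetical allele counts (not the study's data): risk allele vs other allele.
or_, lo, hi = allelic_odds_ratio(case_risk=100, case_other=100, control_risk=80, control_other=120)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # OR = 1.50, 95% CI 1.01-2.23
```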